Discrete Distribution Estimation under User-level Local Differential Privacy
We study discrete distribution estimation under user-level local differential
privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m$ samples
and the privacy of all $m$ samples must be preserved simultaneously. We resolve
the following dilemma: while, on the one hand, having more samples per user
should provide more information about the underlying distribution, on the other
hand, guaranteeing the privacy of all samples should make the estimation
task more difficult. We obtain tight bounds for this problem under almost all
parameter regimes. Perhaps surprisingly, we show that in suitable parameter
regimes, having $m$ samples per user is equivalent to having $m$ times more
users, each with only one sample. Our results demonstrate interesting phase
transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation
risk. Finally, connecting with recent results on shuffled DP, we show that
combined with random shuffling, our algorithm leads to optimal error guarantees
(up to logarithmic factors) under the central model of user-level DP in certain
parameter regimes. We provide several simulations to verify our theoretical
findings.
Comment: 26 pages, 4 figures
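The single-user, single-sample baseline that the abstract compares against can be illustrated with $k$-ary randomized response, a standard item-level LDP mechanism. This is a generic sketch under that mechanism, not the paper's user-level algorithm; the function names `k_rr` and `estimate` are ours:

```python
import numpy as np

def k_rr(sample, k, eps, rng):
    """k-ary randomized response: report the true symbol with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other symbol."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return sample
    other = rng.integers(k - 1)          # uniform over the k-1 other symbols
    return other if other < sample else other + 1

def estimate(reports, k, eps):
    """Debias the empirical frequencies of the privatized reports."""
    n = len(reports)
    freq = np.bincount(reports, minlength=k) / n
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    p_other = (1 - p_true) / (k - 1)
    # E[freq_j] = p_j * (p_true - p_other) + p_other, so invert affinely:
    return (freq - p_other) / (p_true - p_other)
```

With $n$ users holding one sample each, the debiased estimate converges to the true distribution; the paper's question is how this risk changes when each user instead holds $m$ samples under a joint privacy constraint.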
The importance of feature preprocessing for differentially private linear optimization
Training machine learning models with differential privacy (DP) has received
increasing interest in recent years. One of the most popular algorithms for
training differentially private models is differentially private stochastic
gradient descent (DPSGD) and its variants, where at each step gradients are
clipped and combined with some noise. Given the increasing usage of DPSGD, we
ask the question: is DPSGD alone sufficient to find a good minimizer for every
dataset under privacy constraints? As a first step towards answering this
question, we show that even for the simple case of linear classification,
unlike non-private optimization, (private) feature preprocessing is vital for
differentially private optimization. In detail, we first show theoretically
that there exists an example where without feature preprocessing, DPSGD incurs
a privacy error proportional to the maximum norm of features over all samples.
We then propose an algorithm called DPSGD-F, which combines DPSGD with feature
preprocessing, and prove that for classification tasks, it incurs a privacy
error proportional to the diameter of the features. We then demonstrate the
practicality of our algorithm on image classification benchmarks.
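The clip-and-noise step that the abstract describes can be sketched as follows. This is a minimal illustrative DPSGD update with no privacy accounting, and the function name and signature are hypothetical, not the paper's DPSGD-F:

```python
import numpy as np

def dpsgd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DPSGD update: clip each per-example gradient to clip_norm in
    L2 norm, average, then add Gaussian noise calibrated to clip_norm."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=w.shape)
    return w - lr * (avg + noise)
```

The abstract's point is visible in the clipping line: the noise scale is tied to `clip_norm`, so if the raw features (and hence gradients) have a large maximum norm but a small diameter, centering the features before training can sharply reduce the effective noise.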
Concentration Bounds for Discrete Distribution Estimation in KL Divergence
We study the problem of discrete distribution estimation in KL divergence and
provide concentration bounds for the Laplace estimator. We show that the
deviation from the mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon
the best prior result of $k/n$. We also establish a matching lower bound showing
that our bounds are tight up to polylogarithmic factors.
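The Laplace (add-constant) estimator the abstract analyzes is simple to state. The sketch below gives the estimator and a KL-divergence evaluation; the function names are ours:

```python
import numpy as np

def laplace_estimator(counts, alpha=1.0):
    """Add-constant (Laplace) estimator: p_hat_j = (n_j + alpha) / (n + k*alpha).
    With alpha = 1 this is classic add-one smoothing."""
    counts = np.asarray(counts, dtype=float)
    k = len(counts)
    return (counts + alpha) / (counts.sum() + k * alpha)

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0, which the
    add-constant estimator guarantees for q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

Because every cell of the estimate is strictly positive, the KL loss against the true distribution is always finite, which is what makes concentration statements in KL divergence possible for this estimator.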
Unified lower bounds for interactive high-dimensional estimation under information constraints
We consider the task of distributed parameter estimation using interactive
protocols subject to local information constraints such as bandwidth
limitations, local differential privacy, and restricted measurements. We
provide a unified framework enabling us to derive a variety of (tight) minimax
lower bounds for different parametric families of distributions, both
continuous and discrete, under any loss. Our lower bound framework is
versatile and yields "plug-and-play" bounds that are widely applicable to a
large range of estimation problems. In particular, our approach recovers bounds
obtained using data-processing inequalities and Cramér–Rao bounds, two
alternative approaches for proving lower bounds in our setting of interest.
Further, for the families considered, we complement our lower bounds with
matching upper bounds.
Comment: Significant improvements: handle sparse parameter estimation,
simplify and generalize the argument
Subset-Based Instance Optimality in Private Estimation
We propose a new definition of instance optimality for differentially private
estimation algorithms. Our definition requires an optimal algorithm to compete,
simultaneously for every dataset $D$, with the best private benchmark algorithm
that (a) knows $D$ in advance and (b) is evaluated by its worst-case
performance on large subsets of $D$. That is, the benchmark algorithm need not
perform well when potentially extreme points are added to $D$; it only has to
handle the removal of a small number of real data points that already exist.
This makes our benchmark significantly stronger than those proposed in prior
work. We nevertheless show, for real-valued datasets, how to construct private
algorithms that achieve our notion of instance optimality when estimating a
broad class of dataset properties, including means, quantiles, and
$\ell_p$-norm minimizers. For means in particular, we provide a detailed
analysis and show that our algorithm simultaneously matches or exceeds the
asymptotic performance of existing algorithms under a range of distributional
assumptions.
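As a point of reference for the mean-estimation case, a basic worst-case (not instance-optimal) private mean on bounded data can be computed with the Laplace mechanism. This is a standard textbook sketch under an assumed known range $[lo, hi]$, not the paper's subset-based algorithm:

```python
import numpy as np

def dp_mean(x, lo, hi, eps, rng):
    """epsilon-DP mean of values assumed to lie in [lo, hi]: clip to the
    range, then add Laplace noise scaled to the sensitivity (hi - lo) / n."""
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    n = len(x)
    sens = (hi - lo) / n           # change of one point moves the mean by <= sens
    return x.mean() + rng.laplace(0.0, sens / eps)
```

The weakness of this baseline is exactly what motivates instance optimality: its error is driven by the a-priori range $[lo, hi]$ rather than by how spread out the actual dataset is, so on well-concentrated data it adds far more noise than necessary.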